DEWS 2006 3A-i6 Finding Thai Web
Authors
Abstract
As the Web is increasingly recognized as a culturally valuable social artifact, many nations are working to create national Web archives for long-term preservation. However, because the Web has no borders, gathering the information that belongs to a specific nation is challenging. This paper proposes language-specific web crawling (LSWC) as a method of creating Web archives for countries with a distinct linguistic identity, such as Thailand. The LSWC strategy for selectively gathering Thai web pages from virtually anywhere on the Web is derived from static analyses of the Thai Web graph. The strategy is then evaluated on a crawling simulator with a large dataset.
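The idea of language-specific crawling can be illustrated with a minimal sketch. This is not the strategy derived in the paper, whose details the abstract does not give. It assumes two things for illustration only: a page counts as Thai when at least half of its letters fall in the Unicode Thai block (U+0E00 to U+0E7F), and the crawler stops following a path once it has passed more than `max_hops` consecutive non-Thai pages. The names `thai_ratio`, `is_thai`, `crawl`, `max_hops`, and `TOY_WEB` are hypothetical, and the "web" is a small in-memory graph rather than live HTTP.

```python
from collections import deque

# Unicode Thai block (assumption: script membership stands in for language detection).
THAI_START, THAI_END = 0x0E00, 0x0E7F


def _in_thai_block(ch):
    return THAI_START <= ord(ch) <= THAI_END


def thai_ratio(text):
    """Fraction of letter-like characters that belong to the Thai block."""
    letters = [ch for ch in text if ch.isalpha() or _in_thai_block(ch)]
    if not letters:
        return 0.0
    return sum(1 for ch in letters if _in_thai_block(ch)) / len(letters)


def is_thai(text, threshold=0.5):
    """Hypothetical relevance test: mostly Thai characters."""
    return thai_ratio(text) >= threshold


def crawl(pages, seeds, max_hops=1):
    """Breadth-first crawl over a toy web: url -> (text, outlinks).

    Each queue entry carries the number of consecutive non-Thai pages
    on the path that led to it. A Thai page resets the counter; a path
    is abandoned once the counter exceeds max_hops.
    """
    queue = deque((url, 0) for url in seeds)
    seen = set(seeds)
    collected = []
    while queue:
        url, misses = queue.popleft()
        if url not in pages:
            continue
        text, outlinks = pages[url]
        if is_thai(text):
            collected.append(url)
            misses = 0
        else:
            misses += 1
        if misses > max_hops:
            continue  # too far from any Thai page; do not expand
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                queue.append((link, misses))
    return collected


# Toy data, invented for this example.
TOY_WEB = {
    "a": ("สวัสดีครับ", ["b", "c"]),
    "b": ("hello world", ["d"]),
    "c": ("ข่าววันนี้", []),
    "d": ("more english text", ["e"]),
    "e": ("ภาษาไทย", []),
}
```

On this toy graph, `crawl(TOY_WEB, ["a"], max_hops=1)` collects pages `a` and `c` and gives up before reaching `e`, while `max_hops=2` tunnels through the two English pages and also collects `e`. The trade-off between coverage and wasted downloads on irrelevant pages is the kind of question a crawling simulator can be used to study.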
Similar resources
Blind Evaluation for Thai Search Engines
This paper compares the effectiveness of two different Thai search engines by using a blind evaluation. The probabilistic-based dictionary-less search engine is evaluated against the traditional word-based indexing method. The web documents from 12 Thai newspaper web sites consisting of 83,453 documents are used as the test collection. The relevance judgment is conducted on the first five retur...
A Collaborative Framework for Collecting Thai Unknown Words from the Web
We propose a collaborative framework for collecting Thai unknown words found on Web pages over the Internet. Our main goal is to design and construct a Web-based system which allows a group of interested users to participate in constructing a Thai unknown-word open dictionary. The proposed framework provides supporting algorithms and tools for automatically identifying and extracting unknown wor...
Classification of News Web Documents Based on Structural Features
The motivation for this work comes from the need for a Thai web corpus for testing our information retrieval algorithm. Two collections of news web documents are gathered from two different Thai newspaper web sites. Our goal is to find a simple yet effective method to extract news articles from these web collections. We explore the use of machine learning methods to distinguish article pages from...
Web Accessibility for Older Readers: Effects of Font Type and Font Size on Skim Reading Webpages in Thai
Most guidelines for making websites accessible for older people have been developed for the Latin alphabet. Currently, there are no web design guidelines for the Thai language or for Thai older people. Our research investigated the effect of font type and size in Thai on skim reading for Thai younger (21-39 years) and older (59-72 years) adults. There were two levels of font types (Conservative...
Serum concentrations of Krebs von den Lungen-6, surfactant protein D, and matrix metalloproteinase-2 as diagnostic biomarkers in patients with asbestosis and silicosis: a case–control study
BACKGROUND Asbestosis and silicosis are progressive pneumoconioses characterized by interstitial fibrosis following exposure to asbestos or silica dust. We evaluated the potential diagnostic biomarkers for these diseases. METHODS The serum concentrations of Krebs von den Lungen-6 (KL-6), surfactant protein D (SP-D), and matrix metalloproteinase-2 (MMP-2), MMP-7, and MMP-9 were measured in 43 ...